random neural network
On the Sparsity of the Strong Lottery Ticket Hypothesis
Considerable research efforts have recently been made to show that a random neural network $N$ contains subnetworks capable of accurately approximating any given neural network that is sufficiently smaller than $N$, without any training. This line of research, known as the Strong Lottery Ticket Hypothesis (SLTH), was originally motivated by the weaker Lottery Ticket Hypothesis, which states that a sufficiently large random neural network $N$ contains sparse subnetworks that can be trained efficiently to achieve performance comparable to that of training the entire network $N$. Despite its original motivation, results on the SLTH have so far not provided any guarantee on the size of subnetworks. This limitation is due to the nature of the main technical tool leveraged by these results, the Random Subset Sum (RSS) Problem. Informally, the RSS Problem asks how large a random i.i.d.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
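As a rough illustration of the subset-sum idea named in the abstract above, the following brute-force sketch draws a small random sample and searches for the subset whose sum is closest to a target. The uniform distribution on $[-1, 1]$, the sample size of 12, the target 0.3, and the helper name `best_subset_sum` are all assumptions chosen for illustration; none of them is taken from the paper.

```python
import itertools
import random


def best_subset_sum(samples, target):
    """Exhaustively search all subsets for the sum closest to `target`.

    Returns (best_error, indices_of_best_subset). The empty subset
    (sum 0) is the starting candidate, so best_error <= abs(target).
    """
    best_err, best_idx = abs(target), ()
    n = len(samples)
    for r in range(1, n + 1):
        for idx in itertools.combinations(range(n), r):
            err = abs(sum(samples[i] for i in idx) - target)
            if err < best_err:
                best_err, best_idx = err, idx
    return best_err, best_idx


# Assumed setup: 12 i.i.d. samples, uniform on [-1, 1], fixed seed.
rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(12)]
err, idx = best_subset_sum(samples, target=0.3)
print(f"subset {idx} approximates 0.3 with error {err:.2e}")
```

The search is exponential in the sample size, so this is only meant to make the question concrete for small samples, not to reflect how the result is used in proofs.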
RandNet-Parareal: a time-parallel PDE solver using Random Neural Networks
Parallel-in-time (PinT) techniques have been proposed to solve systems of time-dependent differential equations by parallelizing the temporal domain. Among them, Parareal computes the solution sequentially using an inaccurate (fast) solver, and then "corrects" it using an accurate (slow) integrator that runs in parallel across temporal subintervals. This work introduces RandNet-Parareal, a novel method to learn the discrepancy between the coarse and fine solutions using random neural networks (RandNets). RandNet-Parareal achieves speed gains up to ×125 and ×22 compared to the fine solver run serially and Parareal, respectively. Beyond theoretical guarantees of RandNets as universal approximators, these models are quick to train, allowing the PinT solution of partial differential equations on a spatial mesh of up to $10^5$ points with minimal overhead, dramatically increasing the scalability of existing PinT approaches.
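The coarse-then-correct scheme described above can be sketched for a scalar ODE. This is a minimal sketch of classic Parareal only; it does not reproduce the learned RandNet correction. The test problem $y' = -y$, the explicit Euler propagators, and the function names are assumptions made for illustration.

```python
import numpy as np


def coarse(y, t0, t1, f):
    """Cheap propagator G: a single explicit Euler step over [t0, t1]."""
    return y + (t1 - t0) * f(t0, y)


def fine(y, t0, t1, f, substeps=100):
    """Expensive propagator F: many small explicit Euler steps over [t0, t1]."""
    h = (t1 - t0) / substeps
    t = t0
    for _ in range(substeps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t_grid, iterations):
    """Classic Parareal update: U[i+1] = G(U_new[i]) + F(U_old[i]) - G(U_old[i])."""
    n = len(t_grid) - 1
    U = np.empty(n + 1)
    U[0] = y0
    for i in range(n):  # initial guess: serial coarse sweep
        U[i + 1] = coarse(U[i], t_grid[i], t_grid[i + 1], f)
    for _ in range(iterations):
        # These fine solves are independent across subintervals, so they
        # could run in parallel; here they are evaluated in a plain loop.
        F_old = [fine(U[i], t_grid[i], t_grid[i + 1], f) for i in range(n)]
        G_old = [coarse(U[i], t_grid[i], t_grid[i + 1], f) for i in range(n)]
        U_new = np.empty(n + 1)
        U_new[0] = y0
        for i in range(n):  # sequential coarse sweep with correction
            U_new[i + 1] = (
                coarse(U_new[i], t_grid[i], t_grid[i + 1], f) + F_old[i] - G_old[i]
            )
        U = U_new
    return U


def decay(t, y):
    """Assumed test problem: y' = -y."""
    return -y


t_grid = np.linspace(0.0, 2.0, 11)  # 10 temporal subintervals

# Reference: the fine solver applied serially over the whole grid.
serial = np.empty(len(t_grid))
serial[0] = 1.0
for i in range(len(t_grid) - 1):
    serial[i + 1] = fine(serial[i], t_grid[i], t_grid[i + 1], decay)

approx = parareal(decay, 1.0, t_grid, iterations=len(t_grid) - 1)
print(np.max(np.abs(approx - serial)))
```

With as many iterations as subintervals, the Parareal iterate coincides with the serial fine solution up to rounding, so any practical speed-up depends on converging in far fewer iterations.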
Critical Points of Random Neural Networks
This work investigates the expected number of critical points of random neural networks with different activation functions as the depth increases in the infinite-width limit. Under suitable regularity conditions, we derive precise asymptotic formulas for the expected number of critical points of fixed index and those exceeding a given threshold. Our analysis reveals three distinct regimes depending on the value of the first derivative of the covariance evaluated at 1: the expected number of critical points may converge, grow polynomially, or grow exponentially with depth. The theoretical predictions are supported by numerical experiments. Moreover, we provide numerical evidence suggesting that, when the regularity condition is not satisfied (e.g. for neural networks with ReLU as activation function), the number of critical points increases as the map resolution increases, indicating a potential divergence in the number of critical points.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Lazio > Rome (0.04)
Emergence of Structure in Ensembles of Random Neural Networks
Muscarnera, Luca, Loreti, Luigi, Todeschini, Giovanni, Fumagalli, Alessio, Regazzoni, Francesco
Randomness is ubiquitous in many applications across data science and machine learning. Remarkably, systems composed of random components often display emergent global behaviors that appear deterministic, manifesting a transition from microscopic disorder to macroscopic organization. In this work, we introduce a theoretical model for studying the emergence of collective behaviors in ensembles of random classifiers. We argue that, if the ensemble is weighted through the Gibbs measure defined by adopting the classification loss as an energy, then there exists a finite temperature parameter for the distribution such that the classification is optimal with respect to the loss (or the energy). Interestingly, for the case in which samples are generated by a Gaussian distribution and labels are constructed by employing a teacher perceptron, we analytically prove and numerically confirm that such optimal temperature depends neither on the teacher classifier (which is, by construction of the learning problem, unknown) nor on the number of random classifiers, highlighting the universal nature of the observed behavior. Experiments on the MNIST dataset underline the relevance of this phenomenon in high-quality, noiseless datasets. Finally, a physical analogy allows us to shed light on the self-organizing nature of the studied phenomenon.
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland (0.04)
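The Gibbs-weighted ensemble of random classifiers described in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions: the random classifiers are taken to be perceptrons with Gaussian weights, the loss is the training misclassification rate, the measure is parameterized by an inverse temperature `beta`, and the dimensions and the `beta` values are arbitrary. It illustrates the weighting scheme only and does not locate the optimal temperature.

```python
import numpy as np


def gibbs_weights(losses, beta):
    """Gibbs weights p_j proportional to exp(-beta * loss_j), computed stably."""
    logits = -beta * np.asarray(losses, dtype=float)
    logits -= logits.max()  # shift so the largest exponent is 0
    w = np.exp(logits)
    return w / w.sum()


def ensemble_predict(W, weights, X):
    """Weighted majority vote of the perceptrons whose weights are the rows of W."""
    votes = np.sign(X @ W.T)  # shape: (n_samples, n_classifiers)
    return np.sign(votes @ weights)


rng = np.random.default_rng(0)
d, n_train, n_test, M = 20, 200, 1000, 501  # assumed sizes

# Gaussian samples labeled by a teacher perceptron, as in the abstract.
teacher = rng.standard_normal(d)
X_train = rng.standard_normal((n_train, d))
y_train = np.sign(X_train @ teacher)
X_test = rng.standard_normal((n_test, d))
y_test = np.sign(X_test @ teacher)

# Ensemble of M random (untrained) perceptrons and their training losses.
W = rng.standard_normal((M, d))
losses = np.mean(np.sign(X_train @ W.T) != y_train[:, None], axis=0)

for beta in (0.0, 5.0, 50.0):
    p = gibbs_weights(losses, beta)
    acc = np.mean(ensemble_predict(W, p, X_test) == y_test)
    print(f"beta={beta:5.1f}  test accuracy={acc:.3f}")
```

Setting `beta = 0` recovers a uniform vote, while a very large `beta` concentrates the weight on the classifiers with the lowest training loss.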
Fractal and Regular Geometry of Deep Neural Networks
Di Lillo, Simmaco, Marinucci, Domenico, Salvi, Michele, Vigogna, Stefano
We study the geometric properties of random neural networks by investigating the boundary volumes of their excursion sets for different activation functions, as the depth increases. More specifically, we show that, for activations which are not very regular (e.g., the Heaviside step function), the boundary volumes exhibit fractal behavior, with their Hausdorff dimension monotonically increasing with the depth. On the other hand, for activations which are more regular (e.g., ReLU, logistic and $\tanh$), as the depth increases, the expected boundary volumes can either converge to zero, remain constant or diverge exponentially, depending on a single spectral parameter which can be easily computed. Our theoretical results are confirmed by numerical experiments based on Monte Carlo simulations.
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
Solving stochastic partial differential equations using neural networks in the Wiener chaos expansion
Neufeld, Ariel, Schmocker, Philipp
In this paper, we solve stochastic partial differential equations (SPDEs) numerically by using (possibly random) neural networks in the truncated Wiener chaos expansion of their corresponding solution. Moreover, we provide some approximation rates for learning the solution of SPDEs with additive and/or multiplicative noise. Finally, we apply our results in numerical examples to approximate the solution of three SPDEs: the stochastic heat equation, the Heath-Jarrow-Morton equation, and the Zakai equation.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Nonlinear random matrix theory for deep learning
Jeffrey Pennington, Pratik Worah
Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Random ReLU Neural Networks as Non-Gaussian Processes
Parhi, Rahul, Bohra, Pakshal, Biari, Ayoub El, Pourya, Mehrsa, Unser, Michael
We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
- North America > United States > New York (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Full error analysis of the random deep splitting method for nonlinear parabolic PDEs and PIDEs with infinite activity
Neufeld, Ariel, Schmocker, Philipp, Wu, Sizhou
In this paper, we present a randomized extension of the deep splitting algorithm introduced in [Beck, Becker, Cheridito, Jentzen, and Neufeld (2021)] using random neural networks suitable to approximately solve both high-dimensional nonlinear parabolic PDEs and PIDEs with jumps having (possibly) infinite activity. We provide a full error analysis of our so-called random deep splitting method. In particular, we prove that our random deep splitting method converges to the (unique viscosity) solution of the nonlinear PDE or PIDE under consideration. Moreover, we empirically analyze our random deep splitting method by considering several numerical examples including both nonlinear PDEs and nonlinear PIDEs relevant in the context of pricing of financial derivatives under default risk. In particular, we empirically demonstrate in all examples that our random deep splitting method can approximately solve nonlinear PDEs and PIDEs in 10,000 dimensions within seconds.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Banking & Finance > Risk Management (0.34)
- Banking & Finance > Credit (0.34)